Quick links (more details at introduction):

Assignment used the base docker image acessible here.

Chosen study: Serotonin-induced hyperactivity in SSRI-resistant major depressive disorder patient-derived neurons(Vadodaria et al., 2019).

GEO data set: GEO accession GSE125664(Vadodaria KC, 2019)

1. Introduction: A1 and A2 summary

1.1 Selection of an expression data set and software tools

Previously, in A1, we have found an appropriate study (in terms of RNA expression analysis and clinical relevance) by searching RNAseq data set available at [Gene Expression Omnibus repository - GEO] (www.ncbi.nlm.nih.gov/geo)(“Gene Expression Omnibus,” n.d.).

We used Geometadb(Zhu et al., 2008) to perform the search of GEO data sets and RSQLite(Müller et al., 2020) and GEOquerry(Davis & Meltzer, 2007) package to perform the queries using a downloaded copy of GEO contents’ metadata. These packages were installed, when needed through BiocManager(Morgan, 2019) functionality. knitr(Xie, 2021) package was later used to report data statistics and edgeR to aid on modeling and normalization of data.

After 4 search iterations we have chosen the following study: [Serotonin-induced hyperactivity in SSRI-resistant major depressive disorder patient-derived neurons] (https://pubmed.ncbi.nlm.nih.gov/30700803/)(Vadodaria et al., 2019) in which the authors tried to elucidate the common clinical challenge of patients suffering from Major Depressive Disorder (MDD) who fail or respond only partially to SSRI therapy (aka non-remitters). We currently have no biomarkers to identify these patients. Moreover, is not known the specific mechanism underpinning SSRI and similar SNRI-induced suicidality.

It also provided the raw mRNA counts for analysis. The controls are represented by expression data in forebrain neuron lines developed from Induced pluripotent stem cells (iPSCs) removed through non-CNS biopsy of healthy patients exposed to SSRIs. The test group involve the same cells from MDD patients which suffered from SSRI treatment resistance and MDD patients which responded to SSRI treatment.

The sample size in this study is 9, with adequate control and 2 testing groups (MDD with an without SSRI-based treatment resistance).

1.2. Downloading the data, cleaning and compute summary statistics

The data for this study is contained in one supplemental .csv file called “GSE125664_Vadodaria_MDDNeurons_RawCounts”:

Platform title: Illumina HiSeq 2500 (Homo sapiens)

Submission date: Mar 14 2013

Last update date: Mar 27 2019

Organisms: Homo sapiens

Snapshot of data first 10 mapped genes at this stage:

kable(SSRI_exp[1:10,1:10], format = "html")
gname H_neurons_1 H_neurons_2 H_neurons_3 NR_neurons_1 NR_neurons_2 NR_neurons_3 R_neurons_1 R_neurons_2 R_neurons_3
A1BG 220 219 205 211 202 134 270 163 220
A1BG-AS1 106 98 96 91 165 128 104 130 120
A1CF 0 0 1 0 0 0 0 0 1
A2M 9441 18 4 2541 336 488 2396 1584 72
A2M-AS1 9 12 9 22 36 5 26 31 7
A2ML1 11 0 1 6 53 23 21 10 1
A2MP1 4 0 0 1 0 0 0 1 0
A3GALT2 4 1 1 3 10 7 5 6 2
A4GALT 85 48 78 29 1 0 17 0 43
A4GNT 0 1 0 0 1 0 0 0 0
dim(SSRI_exp)
## [1] 22351    10

And we have 22351 genes with measures for 3 controls, 3 non-resistant-to-SSRI MDD patients and 3 resistant-to-SSRI MDD patients.

Counts for each gene show each gene with the frequency of exactly one, except for some miscoded names.

28 genes contained miscoded names (they expressed dates instead of valid identifiers) and we will have to remove them.

Each mRNA reading was already mapped to unique genes expressed as HUGO Gene Nomenclature Committee symbols) so we proceeded to calculate counts per million reads mapped (cmp). All the counts with less than 9 counts where removed and that removed 9350 gene observations. There after that, and after the cleaning the data, there were no duplicates and all the symbols where unique symbols that complied with HUGO Gene Nomenclature Committee. After all removals and manipulations, the dataset coverage was 0.59.

We have kept 12973 gene counts, meaning we had to filter out 0.4188505 of our gene-mapped RNA seq observations

1.3. Normalization

We used TMM (Trimmed Mean of M-values) normalization. For most samples the distribution curve pre and post-normalization is very similar, however there is a noticeable change towards the mean (a bit higher than 5) for the second control (H_neuron_2) and, interestingly, for 2 of the 3 nonremitter patients.

1.4. Differential Gene Expression analysis

Here, we test for statistically significant expression differences for each of the mapped genes, between our 3 groups (we will call it status) of interest in our designed model: healthy patients exposed to SSRI (control), non-remitters MDD exposed to SSRI (test 2) and remitter MDD exposed to SSRI (test 2).

An expansion of this model includes 2 groups (we will call this attribute “affected”), healthy (ND) and affected by disease (D), although this expansion wouldn’t add granularity to the assessment of different subtypes of MDD (refractive or not to treatment), it could help in increasing specificity to the bayes differential expression significance calculation.

We used 2 distinct but symmetrical models that will assess remitters MDD patient’s cells VS Healthy control cells and remitters MDD patient’s cells VS non-remitters MDD patient’s cells separately.

Now we apply the each model separately to the appropriate data set:

## [1] 2306
## [1] 0
## [1] 732
## [1] 0

1.5. Thresholded over-representation analysis

First we compared non-remitters and controls:

## up regulated genes:
length(which(qlf_output_hits_RNxH$table$PValue < 0.05
& qlf_output_hits_RNxH$table$logFC > 0))
## [1] 1345
## down regulated genes:
length(which(qlf_output_hits_RNxH$table$PValue < 0.05
& qlf_output_hits_RNxH$table$logFC < 0))
## [1] 961

We have 1345 genes significantly up-regulated in non-remitters comparing to controls, and 961 down-regulated.

Now We compared non-remitters and remitters:

## up regulated genes:
length(which(qlf_output_hits_RNxR$table$PValue < 0.05
& qlf_output_hits_RNxR$table$logFC > 0))
## [1] 379
## down regulated genes:
length(which(qlf_output_hits_RNxR$table$PValue < 0.05
& qlf_output_hits_RNxR$table$logFC < 0))
## [1] 353

We have 379 genes significantly up-regulated in non-remitters comparing to controls, and 353 down-regulated, which is compatible with finding on paper, where the delta of expression among MDD patients is decreased when compared to remitters and controls.

1.6 Creating thresholded list

Here we will create a thresholded list, initially with the remitters dataset compared to the non remitters dataset, in which our rank will consist in the following product: -log(p-value) * logFC. We then create 3 files to contain, separately:

  1. all significantly differentially expressed genes together
  2. the up-regulated significantly differentially expressed genes
  3. the down-regulated significantly differentially expressed genes

Now doing the same to compare non-remitters to remitters:

Save tables for later use if docker port connection is properly configured:

1.7 Thresholded Gene List Function Enrichment Analysis

We then used G:Profiler web server(Uku Raudvere, 2019)to perform the above gene list enrichment analysis (version e102_eg49_p15_7a9b4d6).

Thus we will be using the following annotation database in the enrichment:

  • GO biological process annotation release 2021-02-01 (with Ensembl version 102)
  • Reactome version 75 (December 2020)
  • Wikipathways (December 2020 release)

All electronic GO annotations were discarded and the annotation term size was restricted to 200 genes.

The tool uses Fisher Exact test. We used 0.05 threshold for significance and Benjamin-Hochberg correction for multiple hypothesis testing.

Please see details on the results at A2.

2. Non-thresholded Gene set Enrichment Analysis

We will now use the ranked gene list for the Non-remitters Vs remitters and Non-remitters Vs controls built on section 1.6 to perform the non-thresholded gene set enrichment analysis which correspond to RNxR_ranked_genelist.txt and RNxH_ranked_genelist.txt (as .rnk versions).

Then we proceed to perform the analysis using GSEA version 4.1.0, through their Java GUI(Tamayo, n.d.), pre-ranked option.

We used genesets from Baderlab Geneset Collection of February, 1st, 2021, containing GO biological process without inferred electronic annotations (v5.7).

We used Ensembl as the platform annotation for symbol mapping (HGNC Gene Symbol schema) as conveyed by their parameter Human_HGNC_ID_MSigDB.v7.4

The minimum geneset size was set to 15 and the maximum to 200. We used gene set permutation (1000 permutations) as the statistic method to assess significance. For reproducibility, the randomness generation seeds used were 1617549485725 1617549485725 for the Non-remitters Vs controls and Non-remitters Vs Remitters, respectively.

For the non-remitters Vs controls we have obtained the following results also available at NRxH_gsea_report_up.tsv and NRxH_gsea_report_down.tsv with output files at RNxH_analysis.GseaPreranked_seed1617549485725 folder):

1616 / 2917 gene sets are upregulated 399 gene sets are significantly enriched at nominal pvalue < 1% 598 gene sets are significantly enriched at nominal pvalue < 5%

1301 / 2917 gene sets are upregulated 223 gene sets are significantly enriched at nominal pvalue < 1% 369 gene sets are significantly enriched at nominal pvalue < 5%

Top 15 up regulated genes at Remitters Vs Controls:

Figure 1: upregulated GSEA results for NRxH

Interesting we many sets related to glutamatergic activity, such as the following (see discussion bellow)

Figure 2: Enrichment plot: GOBP Glutamatergic profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List

Top 15 down regulated genes at Remitters Vs Controls:

Figure 3: downregulated GSEA results for NRxH

There, we see processes mostly related to the dedifferentiation and neuro-differentiation techniques used, as discussed in A2. Among them, processes related to protein membrane-targeting as expected due to transcription factors exposure (forebrain neuro differentiation):

Figure 4: Enrichment plot: GOBP Glutamatergic profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List

Now we obtain the results for the analysis of the non-remitters Vs remitters profile using the same parameters (results also available at NRxR_gsea_report_up.tsv and NRxR_gsea_report_down.tsv with output files at RNxR_analysis.GseaPreranked_seed1617549485725 folder):

1229 / 2917 gene sets are upregulated 38 gene sets are significantly enriched at nominal pvalue < 1% 104 gene sets are significantly enriched at nominal pvalue < 5%

1688 / 2917 gene sets are downregulated
119 gene sets are significantly enriched at nominal pvalue < 1% 325 gene sets are significantly enriched at nominal pvalue < 5%

Top 15 upregulated genes at Non-remitters Vs remitters:

Figure 5: upregulated GSEA results for NRxR

Here we notice an over-representation of eye-development gene sets (cilia and retina development), also present at lower ranks on the up regulated results. As discussed in A2, this is probably an artifact resulting from the dedifferentiation technique iPSCs which was later coupled with SSRI exposure.

Figure 6: Enrichment plot: GOBP Protein localization to Cillium Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List List

Top 15 down regulated genes for NRxR analysis:

Figure 7: downregulated GSEA results for NRxR

Again, we see the most significant associations probably related to Induced pluripotent stem cells dedifferentiation technique, which was coupled with exposure to neuron differentiation technique in the study, thus, as discussed in A2, probably a technique artifact.

The results relating to cation transport regulation, however, were still relevant, even at the comparison of Non-remitters Vs remitters:

Figure 8: Enrichment plot GOBP Regulation of Cation Channel Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List

GSEA non-thresholded results were comparable in a non-straightforward fashion to g:profiler thresholded results. Many gene processes that were highly significant the GP-BP output of G:profiler were naturally present at the GSEA output, especially those related to cation transport and synaptic dynamics. However that comparison is not direct since GSEA uses the whole list of genes. Another reason why a direct comparison would be flawed is the fact that the most significant results at G:profiler were those related to more restricted but highly curated annotations, while in GSEA the wider annotations from GO-BP seemed more useful due to it negative power (more gene sets would potentially not be present in our expression dataset thus resulting in more robust results).

Interestingly processes that were not highly significant at G:Profiler non-thresholded analysis such as the ones related to glutamate signalling were highly present at the comparison of non-remitters with both controls and remitters (see interpretation) and pathways related to endogenous opiates (which were somewhat significant with the G:profiler analysis) where not relevant at GSE analysis.

3. Cytoscape Vizualization of GSEA

Here we will use Cytoscape (Ono, n.d.) v.3.8.2 with the Enrichment map application v.3.3.1, its supporting pipeline collection v.1.1.0 (March, 2021). In a second step, we also used the supporting extension AutoAnnotate v.1.3.3, clusterMaker2 v1.3.1 and WordCloud v3.1.1.

For the creation of the network, we used the GSEA results (see details in section 2), also available at RNxH_analysis.GseaPreranked_seed1617549485725 and RNxR_analysis.GseaPreranked_seed1617549485725 folders.

The following parameters were applied for creating and annotation:

FDR cutoff: 0.1 p-value cutoff: 1.0 NES: All No filtering Baderlab names parsed Data Set Edges Automatic (combining them would have no effect since we are performing separate analysis) Edge display cutoff as 0.375 for preliminary view and 0.7 for later presentation Metric used for edge display: Jaccard and Overlap combined, 50/50 Annotation with layout network using MLC clustering algorithm by similarity over 16 iterations with weak edge cutoff of 1e-15 and kept residue of 0.001 WordCloud with Adjacent Words for labeling with 3 maximum words limit

Remitters Vs Controls network (network available at NRxH_network.cif) contained 2917 nodes and 2515 edges (0.7 cutoff on similarity), as seen bellow.

Figure 9: Preliminary Macro-organization view of NRxH network

For comparison, notice the correspondent network for the Non-remitters Vs Remitters comparison, with network also available at NRxR_network.cif.

Figure 10: Preliminary Macro-organization view of NRxR network

Without manual curation, relevant aspects of the network are difficult to distinguish, as noticed bellow.

Figure 11: A central view of annotated NRxH network before manual adjustment

After manual manipulation, while still maintaining the nodes topology and relationships:

Figure 12: Central view of annotated NRxR network after careful manual adjustment

The we proceed to collapse the network into main themes. Again, without manual curation, even the thematic network is of difficult interpretation.

Figure 13: Theme collapsed summary for NRxR

Again, a careful manual focus and manipulation to focus on relevant sets (those potentially not related o iPSCs technique artifacts):

Figure 14: Theme collapsed summary focused on central hub for NRxR

Interestingly (see discussion for details and also section 2), the major themes of Glutamatergic metabolism and Calcineurin interplay continues to be relevant, even at the narrower comparison scale of non-remitters Vs remitters. The later does fit the model presented by the author at the main paper while the former expands its.

4. Pathway investigation and Discussion

Since we have noticed the presence of pathways and themes related to differential glutamatergic activity / calcineurin activity, we initially tried to load our ranked data into Wikipathways’ curated network. However, connections at the imported network and its pathway version were scarce.

Figure 15: NDMA-related central hub for NRxR

We then highlighted the NDMA receptor-related cluster (a central hub in our network) through the GO annotation previously attached and used GeneMania(Franz et al., 2018).

Figure 16: Relationship type and extent of NDMA-associated genes on NRxR dataset through GeneMania

Then we imported our ranked data to match the gene names on this NDMA-related network.

Figure 17: Level of regulation in NDMA-associated genes on NRxR dataset loaded to GeneMania, green(-), pink(+)

And the results strikingly support a glutamatergic (general) hyperactivity on SSRI-resistant VS SSRI-responsive comparison.

Now we visualize the same NMDA-related set protein-protein interactions through the STRINGS extension(Szklarczyk et al., 2018). We notice the interesting central position of DLG3, which encodes for a guanylate kinase with possible role in clustering glutamate receptor at excitatory synapsis and potentially in mental disorders[frank2011clustered].

Figure 18: Central network role of DKG3 (in yellow) in protein-protein interactions with NDMA-associated sets vizualized through STRINGS

Many of the results seen since ORA and now with the network analysis can unfortunately be related to side effects of the dedifferentiation and specialization techniques used by the authors (iPSCs)(George et al., 2005) (Brennand KJ, 2011). Also, the network approach has failed to demonstrate a significant presence of endogenous opiate deregulation that were considered possible during our previous over-representation analysis.

However the new network representation of our ranked analysis has confirmed a central hub of differentially expressed gene sets that share the following common themes: glutamatergic activity, calcium transport and localization and calcineurin activity.

The main purpose of the main paper authors was to contribute to the understanding of why certain patients are resistant to SSRI therapy(Vadodaria et al., 2019). In that sense, it is possibly relevant that those themes (and opposite to what happened to our endorphin considerations) are present at both the comparison of therapy-resistant patients VS controls and therapy resistant patients VS therapy responsive patients.

The results corroborate the author’s hypothesis that SSRI resistant patients present a pattern 5-HT-induced hyperactivity due to up regulated 5-HT2A and 5-HT7 receptors (proxied by deregulation of cellular calcium localization mechanisms). The integrated network/pathway analysis could have also added that such hyperactivity maybe related also to glutamatergic signalling or even calcineurin activity downstream of AMPA receptors. The role of both (calcineurin and NMDA/AMPA, less so for kainate receptors) in long term potential depression have been widely studied(Li et al., 2002) among many others. Glutamate metabolism in resistant and non-resistant Major depression patients have also been studied widely (Abdallah et al., 2014) but no definitive role or effective first-line pharmacological approach has been developed on those lines (The closest one, lamotrigine, having adjuvant role in clinical context).

That leads us to some considerations regarding the limits of any conclusions that could be drawn. First, it seems possible the the differential effects on glutamate/calcineurin signalling represent more noise than signal, in a context of non-specific widespread over-activity of cells exposed to SSRI in vitro. Secondly, we cannot rule out the iPSCs-formation techniques potential to be the primary drive of glutamate/calcineurin signalling changes.

5. References

Abdallah, C. G., Jiang, L., De Feyter, H. M., Fasula, M., Krystal, J. H., Rothman, D. L., Mason, G. F., & Sanacora, G. (2014). Glutamate metabolism in major depressive disorder. American Journal of Psychiatry, 171(12), 1320–1327.

Brennand KJ, et a. (2011). Modelling schizophrenia using human induced pluripotent stem cells. Nature, 473, 221–225.

Davis, S., & Meltzer, P. (2007). GEOquery: A bridge between the gene expression omnibus (geo) and bioconductor. Bioinformatics, 14, 1846–1847.

Franz, M., Rodriguez, H., Lopes, C., Zuberi, K., Montojo, J., Bader, G. D., & Morris, Q. (2018). GeneMANIA update 2018. Nucleic Acids Research, 46(W1), W60–W64. https://doi.org/10.1093/nar/gky311

Gene expression omnibus. (n.d.). In National Center for Biotechnology Information. U.S. National Library of Medicine. https://www.ncbi.nlm.nih.gov/geo/

George, A., Schmid, K. L., & Pow, D. V. (2005). Retinal serotonin, eye growth and myopia development in chick. Experimental Eye Research, 81(5), 616–625.

Li, S.-T., Kato, K., Tomizawa, K., Matsushita, M., Moriwaki, A., Matsui, H., & Mikoshiba, K. (2002). Calcineurin plays different roles in group ii metabotropic glutamate receptor-and nmda receptor-dependent long-term depression. Journal of Neuroscience, 22(12), 5034–5041.

Morgan, M. (2019). BiocManager: Access the bioconductor project package repository. https://CRAN.R-project.org/package=BiocManager

Müller, K., Wickham, H., James, D. A., & Falcon, S. (2020). RSQLite: ’SQLite’ interface for r.

Ono, K. (n.d.). In Cytoscape. https://cytoscape.org/

Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N. T., Morris, J. H., Bork, P., Jensen, L. J., & Mering, C. von. (2018). STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613. https://doi.org/10.1093/nar/gky1131

Tamayo, P. (n.d.). Gene set enrichment analysis. In GSEA. UC San Diego. http://www.gsea-msigdb.org/gsea/index.jsp

Uku Raudvere, I. K., Liis Kolberg. (2019). G:Profiler: A web server for functional enrichment analysis and conversions of gene lists.

Vadodaria, K. C., Ji, Y., Skime, M., Paquola, A., Nelson, T., Hall-Flavin, D., Fredlender, C., Heard, K. J., Deng, Y., Le, A. T., & others. (2019). Serotonin-induced hyperactivity in ssri-resistant major depressive disorder patient-derived neurons. Molecular Psychiatry, 24(6), 795–807.

Vadodaria KC, G. F. (2019). Serotonin-induced hyperactivity in ssri-resistant major depressive disorder patient-derived neurons. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125664

Xie, Y. (2021). Knitr: A general-purpose package for dynamic report generation in r. https://yihui.org/knitr/

Zhu, Y., Davis, S., Stephens, R., Meltzer, P. S., & Chen, Y. (2008). GEOmetadb: Powerful alternative search engine for the gene expression omnibus. Bioinformatics (Oxford, England), 24(23), 2798–2800. https://doi.org/10.1093/bioinformatics/btn520